For almost 20 years, the Portland public schools have
maintained a testing program anchored in locally constructed tests.
The district's newly developed tests are called "goal-referenced
tests," a version of criterion-referenced tests that meet the
district's purposes because they reflect what well-qualified teachers
in the district believe should be taught and measured. In
constructing the tests, curriculum personnel select goals they
believe the test should measure; teachers develop items that measure
those goals; the tests are given a trial administration and items are
analyzed; the items, formats, and directions are revised; and the
tests are administered for their intended use. The district uses the
Rasch test scaling procedure, which identifies an equal-interval scale
of difficulty for a given set of items based on information about
item difficulty and total test performance for the group tested.
The procedure yields information on item difficulty and an
estimate of the ability of individuals and groups tested. This
permits establishment of a scale that is independent of the norming
population and allows for the creation of item pools. (Author/IBT)
Developments in Goal Based Measurement in the Portland Public Schools



Dr. Victor Doherty 
Portland Public Schools 



For almost 20 years the Portland Public Schools has maintained a
testing program anchored in locally constructed tests and local norms
using standard scores with a mean of 50 and a standard deviation of 10.
Under the leadership of George Ingebo and Dean Forbes this program
served the purposes of research and evaluation in the school system much
better, in our judgment, than publishers' tests with their elusive national
norms and non-equal-interval derived scores. It is a tribute both to the
leadership of measurement people in the District and to the Superintendents
under whom they served that such a program survived the pressures
that constantly urged return to politically attractive uses of
standardized tests and grade-equivalent scores.
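The local standard scores mentioned above (mean 50, standard deviation 10) are a linear transformation of raw scores. A minimal sketch, assuming a simple linear conversion against the local norm group using the population standard deviation; the paper does not describe how Portland computed its norms, and the function name and sample scores are hypothetical:

```python
from statistics import mean, pstdev


def standard_scores(raw_scores, target_mean=50.0, target_sd=10.0):
    """Linear standard scores: target_mean + target_sd * z.

    z is each raw score's distance from the norm-group mean in units of
    the norm group's population standard deviation.
    """
    m = mean(raw_scores)
    sd = pstdev(raw_scores)
    if sd == 0:
        raise ValueError("all raw scores are equal; standard deviation is zero")
    return [target_mean + target_sd * (x - m) / sd for x in raw_scores]


# Hypothetical raw scores from a local norm group.
scores = standard_scores([12, 15, 18, 21, 24])
```

Because the conversion is linear, it keeps the order and relative spacing of the raw scores; only the units change.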


Events of the past five years, however, have imposed on the Portland
schools a need to advance its program to yet another stage of development,
one which we hope will represent important progress in public school
measurement.

An event in 1970-71 that helped precipitate change was the creation
of three sub-districts having considerable autonomy in planning and
evaluation. A Central Evaluation Department was created to monitor progress
in each of the three new sub-districts, and to audit evaluations that
each was staffed to perform. Two areas chose to continue testing programs
similar to the former district-wide program; the third chose to use
nationally standardized tests with grade-level equivalents. In this third area,
different test forms were also administered to students of different
ability.

Problems in central auditing of sub-district evaluations were posed
immediately by the different testing programs of the three areas. To
preserve some elements of a common measurement base, city-wide administration
of math and reading tests formerly used was required for grades 4 and 8,
along with TAP math and reading tests for grade 11.

This much knowledge of the background of Portland's testing program
is important in understanding what follows.
Shortly after the Central Evaluation Department was created in 1970-71,
it became evident that program evaluation of the type desired in Portland
simply could not occur without well-defined learning outcomes in the
various courses of study. Behavioral objectives, with their extreme
specificity and stated conditions of performance, did not seem to be a viable
type of outcome statement for use in planning and evaluating instructional
programs. So we set about to create a type of statement that served these
purposes effectively, and came up with something called a "course goal,"
which is simply a concise, clear statement of desired learning.
The Central Evaluation Department organized a three-county effort
to develop this new tool for planning and evaluation. Over a four-year
period, comprehensive, carefully classified sets of course goals were
produced in 12 fields of study.

The tri-county goal-defining effort was intended to place a resource
in the hands of teachers and administrators that would permit them to
select rather than create statements of desired learning. This seemed
necessary since attempts of school systems throughout the country to have
teachers create such statements seemed to produce results of insufficient
quality for successful planning and evaluation. The 12 course-goal
collections created by the tri-county cooperative effort now provide a base
for planning and measurement that is comprehensive and of acceptable
quality.

Through the work of Fred Forster, we now have the ability to print
out item results for each goal represented in each test developed for use
in the system. In developing new tests, the first step is for curriculum
personnel to select the goals they believe the test should measure. The
second step is for teachers to develop items that measure those goals.
In doing this, teachers follow a procedure for goal domain development
published by the Northwest Evaluation Association. The third step is
trial administration of test modules and item analysis. The fourth is
revision of items, test formats, and directions based on item analysis
and experience with the trial administration; and the fifth is
administration of the test for its intended use in the system. At this point,
information on extent of goal attainment is printed for each student
and for each class for teacher use.
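Reporting goal attainment amounts to grouping item results by the goal each item measures. A minimal sketch, assuming "extent of goal attainment" is the proportion of a goal's items answered correctly and that the class summary averages the students' proportions; the data layout and function names are hypothetical, since the paper does not describe the format of the printouts:

```python
from collections import defaultdict


def goal_attainment(item_goals, responses):
    """Proportion of items answered correctly, per goal, for one student.

    item_goals: item id -> goal label.
    responses:  item id -> 1 (correct) or 0 (incorrect).
    """
    correct = defaultdict(int)
    attempted = defaultdict(int)
    for item, score in responses.items():
        goal = item_goals[item]
        attempted[goal] += 1
        correct[goal] += score
    return {g: correct[g] / attempted[g] for g in attempted}


def class_attainment(item_goals, class_responses):
    """Class summary: mean of the students' per-goal proportions.

    class_responses: student id -> that student's responses dict.
    Students who saw no item for a goal are left out of that goal's mean.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for responses in class_responses.values():
        for goal, share in goal_attainment(item_goals, responses).items():
            totals[goal] += share
            counts[goal] += 1
    return {g: totals[g] / counts[g] for g in totals}
```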



The resultant tests are what we term goal-referenced tests. They
are simply another version of what are sometimes called
objective-referenced or criterion-referenced tests. Their superiority for our
purposes derives from the fact that they reflect what well-qualified
teachers in the District believe should be taught and measured.

I have not yet touched on a second development that reinforces
this goal-referencing capability to open new testing potentials in
Portland. The Rasch test scaling procedure, promoted by Ben Wright
and others in this country, involves the identification of an
equal-interval scale of difficulty for a given set of items based upon
information about item difficulty and total test performance for the group
tested. The Rasch procedure attempts to define item difficulty with the
greatest precision possible on the basis of trial item administrations.
The procedure can yield information on item difficulties for any test
administered to any group, and also yields an estimate of the ability
of individuals and groups tested.
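In the Rasch model for items scored right or wrong, the probability that student n answers item i correctly is exp(θn - bi) / (1 + exp(θn - bi)), where θn is ability and bi is difficulty, both in logits. The paper does not say which estimation routine Portland used. The sketch below uses joint maximum likelihood (JMLE), alternating Newton-Raphson steps for items and students, as one common way to get both sets of estimates from a single response matrix. Function names and the response data are hypothetical, and JMLE estimates are known to be somewhat biased on short tests, so this is an illustration rather than a production calibration program:

```python
import math


def rasch_prob(theta, b):
    """Rasch model: chance of a correct answer at ability theta on an item of difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def calibrate_jmle(data, max_iter=500, tol=1e-6):
    """Joint maximum-likelihood (JMLE) Rasch calibration.

    data: persons x items matrix of 0/1 scores. Assumes no person or item
    has a zero or perfect score (those have no finite estimate and must be
    set aside first). Returns (abilities, difficulties) in logits, with the
    scale origin fixed so that mean item difficulty is 0.
    """
    n_persons, n_items = len(data), len(data[0])
    person_raw = [sum(row) for row in data]
    item_raw = [sum(row[i] for row in data) for i in range(n_items)]

    # Starting values: log-odds of success for persons, of failure for items.
    theta = [math.log(r / (n_items - r)) for r in person_raw]
    b = [math.log((n_persons - s) / s) for s in item_raw]

    for _ in range(max_iter):
        largest = 0.0
        # Newton-Raphson step for each item with abilities held fixed:
        # raise b_i if the model expects more correct answers than observed.
        for i in range(n_items):
            p = [rasch_prob(t, b[i]) for t in theta]
            info = sum(q * (1.0 - q) for q in p)
            step = max(-1.0, min(1.0, (sum(p) - item_raw[i]) / info))  # damped
            b[i] += step
            largest = max(largest, abs(step))
        # Translate both sets of estimates so mean difficulty is 0
        # (the model fixes only differences, not the origin).
        shift = sum(b) / n_items
        b = [d - shift for d in b]
        theta = [t - shift for t in theta]
        # Newton-Raphson step for each person with difficulties held fixed.
        for n in range(n_persons):
            p = [rasch_prob(theta[n], d) for d in b]
            info = sum(q * (1.0 - q) for q in p)
            step = max(-1.0, min(1.0, (person_raw[n] - sum(p)) / info))
            theta[n] += step
            largest = max(largest, abs(step))
        if largest < tol:
            break
    return theta, b


# Hypothetical 8-student, 4-item response matrix (1 = correct).
data = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0], [0, 1, 0, 1],
        [1, 0, 0, 0], [1, 1, 0, 1], [0, 0, 1, 0], [0, 1, 1, 1]]
abilities, difficulties = calibrate_jmle(data)
```

At convergence, each item's and each student's expected score under the model matches the observed raw score, and students with the same raw score receive the same ability estimate.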



What advantages does this method have over conventional test norming
and scaling procedures? First, it permits establishment of a scale that
is independent of a norming population. Given conditions of curricular
validity and good test construction, it appears that item calibrations
based on administering a test to 200 or more students are very stable
and quite robust with regard to the achievement level of the norm group.
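One way to see why calibrations need not depend on the norming group: under the Rasch model the log-odds of success is θn - bi, so when two items are compared for the same student, the student's ability cancels. This is a property of the model and holds in practice only to the extent that the data fit it, which is the force of the qualification about curricular validity and good test construction:

```latex
\begin{aligned}
P_{ni} &= \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
  \quad\Longrightarrow\quad \ln\frac{P_{ni}}{1 - P_{ni}} = \theta_n - b_i \\
\ln\frac{P_{ni}}{1 - P_{ni}} - \ln\frac{P_{nj}}{1 - P_{nj}}
  &= (\theta_n - b_i) - (\theta_n - b_j) = b_j - b_i
\end{aligned}
```

Any group of students, more or less able, should therefore yield the same differences between item difficulties apart from sampling error; only the origin of the scale is a convention.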



A second advantage of the Rasch procedure, and one of great importance to
us, is the ability to create item pools through the administration of a
large number of different tests linked to one another by overlapping
items. By obtaining difficulty values (calibrations) of the linking or
overlapping items, and then adjusting the calibrations from one test to
the other, it is possible to place all items in all tests on a difficulty
continuum. The scale thus created makes it possible to secure comparable
performance estimates for various groups attempting any items in the pool.
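Because each separate Rasch calibration fixes its own origin (for example, by centering its item difficulties on zero), calibrations of the same items from two linked tests should differ, apart from error, by a constant. A minimal mean-shift linking sketch, assuming that model and hypothetical item identifiers; the paper does not specify Portland's linking procedure, and real work would also check each link item's fit before using it:

```python
from statistics import mean


def link_forms(base, new):
    """Place a new form's item calibrations on the base form's scale.

    base, new: dict item id -> Rasch difficulty (logits), each from its own
    separate calibration. Items present in both dicts are the link items.
    """
    common = base.keys() & new.keys()
    if not common:
        raise ValueError("forms share no linking items")
    # Average translation between the two scales, estimated from link items.
    shift = mean(base[i] - new[i] for i in common)
    return {item: d + shift for item, d in new.items()}


def build_pool(base, new):
    """Merge a linked form into the pool, keeping base values for link items."""
    pool = link_forms(base, new)
    pool.update(base)
    return pool
```

Keeping the base-form values for the link items is one design choice; averaging the two linked estimates is another.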




To understand the importance of this procedure it is necessary to
return to our goal-based system of test construction. One of the most
persistent objections raised by teachers to measurement, and especially to use
of standardized tests, is the difficulty of finding or constructing tests
that correspond to the outcomes sought by particular teachers. That objection
can be overcome by a system that (1) permits teachers to select the
goals they wish to have measured, and (2) has calibrated items relating to
those goals from which total-score estimates can be derived that are
statistically comparable to those derived from any other set of items
administered from the same pool.
The combined goal-referencing and Rasch scaling capabilities, if
all assumptions, techniques, and procedures prove valid, should satisfy
these two conditions.
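Condition (2) rests on a Rasch property: once items are calibrated on the pool scale, a student's ability can be estimated from whatever subset of pool items he or she took, as the θ at which the expected score (the sum of the Rasch probabilities over those items) equals the observed raw score. A minimal Newton-Raphson sketch, assuming the difficulties are already on the common scale; the function name and values are hypothetical:

```python
import math


def estimate_ability(responses, difficulties, max_iter=50, tol=1e-8):
    """Maximum-likelihood Rasch ability for one student on any set of
    pre-calibrated items.

    responses:    list of 0/1 answers.
    difficulties: matching list of pool calibrations (logits).
    """
    raw = sum(responses)
    n = len(responses)
    if raw == 0 or raw == n:
        raise ValueError("zero or perfect score: no finite ML estimate")
    theta = math.log(raw / (n - raw))  # starting value: log-odds of success
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        info = sum(q * (1.0 - q) for q in p)
        # Move theta until the expected score equals the raw score (damped step).
        step = max(-1.0, min(1.0, (raw - sum(p)) / info))
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical pool calibrations for the five items one student took.
ability = estimate_ability([1, 1, 0, 1, 0], [-1.0, -0.5, 0.0, 0.5, 1.5])
```

Two students who took different item sets from the pool receive estimates on the same logit scale, which is what makes their scores comparable and combinable, subject to the model fitting the data.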



Having a common metric for a large pool of items not only makes it
possible to secure comparable measures for different groups working on
different goals; it also makes possible the administration of easier tests
to less able students and more difficult tests to more able students while
retaining score-comparing and score-averaging capabilities.

Portland's test development work of the past two years has made
increasing use of the capabilities just described. Following is a brief
review of tests developed or under development in the school system:



City-wide reading and math tests, grades 4 and 8
    Items in existing tests have been referenced to tri-county goals; goal
    attainment reports are now provided to teachers. Tests Rasch analyzed,
    but Rasch scaled scores not yet used.

Credit by examination tests for high school
    Tests in 9th grade math, science, and language arts have been created by
    teachers who wrote items to measure goals selected by teacher committees.
    Tests Rasch analyzed, but Rasch scale scores not yet used.

High school math course examinations
    Over 150 forty-five-minute modules with overlapping items have been
    created for measuring mid-term and second-term achievement in 19 high
    school math courses. Mid-term tests Rasch analyzed; second-term tests to
    be Rasch analyzed this summer. Program should be in standard use by
    mid-term, 1977, with Rasch scaled score reporting, possibly supplemented
    by standard scores (mean 50, S.D. 10). Results reported by goal as well
    as by total score.

Elementary "level tests" in mathematics and reading, grades 3-8
    Over 2000 items Rasch calibrated in elementary reading and math through
    administration of modules in Portland and cooperating school systems.
    Level tests being constructed for math (Fall, 1976) and reading (Fall,
    1977). When completed, it should be possible to administer a short test
    appropriate to the "functioning level" of the student and to secure a
    more reliable measure than from the longer tests formerly used. Scores
    from any of these level tests should be comparable to those from any
    other, and statistically combinable. Results reported by goal as well as
    by total score.
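The expectation that a short test matched to a student's functioning level can measure more reliably than a longer, poorly matched test follows from how the Rasch model assigns precision: the standard error of an ability estimate is 1 divided by the square root of the test information, and each item contributes p(1 - p), which is largest when the item's difficulty is near the student's ability. A minimal sketch with invented difficulties for one hypothetical student:

```python
import math


def rasch_standard_error(theta, difficulties):
    """Model standard error of a Rasch ability estimate at theta.

    Test information is the sum of p * (1 - p) over the items taken;
    the standard error is 1 / sqrt(information).
    """
    info = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        info += p * (1.0 - p)
    return 1.0 / math.sqrt(info)


# Hypothetical student functioning at -1.5 logits.
theta = -1.5
targeted = [-2.0 + 0.1 * k for k in range(10)]   # 10 items near the student's level
off_level = [1.0 + 0.1 * k for k in range(20)]   # 20 items well above it
print(round(rasch_standard_error(theta, targeted), 2))
print(round(rasch_standard_error(theta, off_level), 2))
```

With these made-up values the 10 targeted items give a noticeably smaller standard error than the 20 off-level items; the size of the gain depends entirely on how far off-level the longer test is.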